Abstract: The joint saliency map (JSM) [1] was developed to assign
high joint saliency values to corresponding saliency structures
(called joint saliency structures, JSSs) and zero or low joint
saliency values to the outliers (or mismatches) introduced by
missing correspondence or local large deformations between the
reference and moving images to be registered. The JSM guides local
structure matching in nonrigid registration by emphasizing the JSSs'
sparse deformation vectors in the adaptive kernel regression of
hierarchical sparse deformation vectors for iterative dense
deformation reconstruction. By designing an effective
superpixel-based local structure scale estimator to compute the
reference structure's structure scale, we further propose to
determine the scale (width) of the kernels in the adaptive kernel
regression by combining the structure scales with JSM-based scales
of mismatch between the local saliency structures. We can therefore
adaptively select the sample size of sparse deformation vectors used
to reconstruct the dense deformation vectors, so that every local
structure in the two images is accurately matched. The experimental
results demonstrate that our method aligns two images with missing
correspondence and local large deformation more accurately than
state-of-the-art methods.

Index Terms: nonrigid registration, structure scale, mismatch
scale, kernel scale, joint saliency map, outliers, missing
correspondence, local large deformation, kernel regression



I. Introduction 

Nonrigid image registration [2] is a procedure that minimizes the
difference between one (reference) image and another (moving) image
by spatially aligning every pair of corresponding local structures.
In early detection of pathology, correctly matching corresponding
small local structures is especially important for identifying the
differences in morphology that distinguish pathological from healthy
states. However, owing to the outliers introduced by missing
correspondence, such as a tumor appearing in the preoperative image
but not in the intraoperative image, and/or by local large
deformations in the two images to be registered, robustly
determining an accurate one-to-one correspondence between local
structures is still a challenging task. In particular, these missing
correspondences of local structures are always accompanied by local
large deformations.

These outliers exhibit large structural discrepancies in the locally
varying spatial context, where the mismatches between local
structure pairs can be so complex that the surrounding local
structures distort in very different ways. Because the outliers
present such varied local differences, local-model-based methods
[3][4][5], which can deal with locally varying differences, are
considered an ideal way to account for them. To tackle these outlier
problems, both intensity-based and feature-based registration
methods have seen various efforts in recent years. Feature-based
registration uses a local model of sparse image representation to
select corresponding features in the two images; it can therefore be
regarded as local-model-based registration that directly matches the
local structures by finding a geometric transformation from these
sparse feature correspondences. However, the computation is still
sensitive to false correspondences in the outliers. Recently, a
Bayesian regression model [6] successfully inferred a continuous and
locally smooth transformation for registering challenging 2D point
sets and compared favorably with state-of-the-art methods [7][8][9]
in the presence of both noise and outliers. This work shows that
local models (such as regression models) are better suited to
modeling the registration transformation than other
interpolation-based techniques.

We regard most intensity-based registration as global-model-based
registration, which is often formulated as a global energy
minimization problem whose energy is composed of a regularization
term and a similarity term. Because the whole-image intensity
similarity driven by a global model cannot represent the similarity
of local structures, the outlier problems were only partially solved
by using a locally varying weight between regularization and
similarity [10], creating artificial correspondence [11], or
removing the outliers with cost-function masking [12]. These
approaches either depend heavily on the outlier segmentation or
cannot automatically handle missing correspondence and local large
deformations simultaneously.

Our previous work proposed a joint saliency map (JSM) [1] to
highlight the corresponding saliency structures (called joint
saliency structures, JSSs) in the two images and to emphatically
group those JSSs in the weighted joint histogram computation for
automatic rigid intensity-based registration of challenging image
pairs with outliers. After obtaining the sparse deformation vectors
of the moving image by hierarchical block matching, we further used
the JSM to emphasize the JSSs' sparse deformation vectors in the JSS
adaptive kernel regression, which automatically reconstructs dense
deformation vectors in intensity-based nonrigid image registration
[13]. The registration (deformation) accuracy of local structures in
local estimates depends mainly on the shape and size of the
neighborhood of deformation vectors and/or on the estimation weights
used for local estimation. Our JSS adaptive kernel regression adapts
the shape and orientation of the kernel function to the local
saliency structure of the reference image, so that more displacement
vector samples belonging to the same local structure are grouped
together and the regression of local deformation accords with the
local saliency structures in the reference image. In addition, the
JSM raises the weights of the JSSs within the moving window/kernel
when reconstructing local dense deformation vectors, while
suppressing outlier effects in the regression.

However, our method still could not accurately describe the
deformation in some small local structures because the window
size for the kernel regression is fixed. The scale (width)
of moving window/kernel determines the sample size of sparse 
displacement vectors participating in the kernel regression 
and therefore controls the amount of deformation smoothing 
introduced by the local approximation. A small scale means 
a small window and corresponds to noisy estimates, less 
biased, and with high variance. Comparatively, a large scale 
corresponds to a large window and therefore to smooth defor- 
mation estimates, with low variance and typically increased 
estimation bias. Thus, the local scale of kernel regression 
controls the trade-off between the registration accuracy and the 
smoothness of the local deformation field. The optimal choice 
of kernel scale depends on the mismatch degree (registration 
inaccuracy) and structural scale of underlying local structures 
to be matched. For large structures and large mismatches, 
we would like the kernel scale to be large to reduce the 
registration (or deformation) variance. For small structures and 
small mismatches, a small kernel scale is desirable in order 
to reduce the registration bias error. Therefore, the local scale 
of kernel regression should be adaptively proportional to the 
structure scale and the mismatch degree (scale) of underlying 
saliency structures to be registered. 

With the above observations in mind, we propose a new method with
three contributions. 1) We introduce a mismatch scale into nonrigid
image registration by using the JSM, whereby we can judge the
registration inaccuracy in local structure matching. 2) We design a
simple but effective superpixel-based local structure scale
estimator, which first segments the reference image into
multi-resolution superpixel [30][18] structural regions and then
calculates the structure scales of the Gaussian-smoothed superpixel
regions in terms of variance in a scale-space framework through the
minimum description length (MDL) criterion [19][35]. 3) We introduce
a local kernel scale selection scheme that conflates the mismatch
scale with the superpixel-based structure scale, and apply it to our
previous nonrigid image registration using JSS adaptive kernel
regression. By integrating this local scale selection scheme into
multi-resolution adaptive kernel regression, the nonrigid
registration can iteratively guide the deformation of each local
structure towards the well-aligned position and orientation. We can
therefore achieve more accurate local structure matching in small
structures while maintaining a smooth deformation field around local
structures. The proposed method is elaborated in Section 2, followed
by experimental results in Section 3. The paper is concluded in
Section 4.

II. Background and Related Works 
A. Joint Saliency Structure Adaptive Kernel Regression 

Inspired by the success of local approximation by kernel regression
(or nonparametric regression) in signal reconstruction, we treat
nonrigid image registration as a locally adaptive kernel regression
that iteratively reconstructs dense deformation vectors from the
sparse deformation vectors obtained through hierarchical block
matching. After Suarez et al. [14] used normalized convolution [15]
to estimate a dense deformation field from a sparse deformation
field, two recent works [16][17] also utilized kernel regression to
estimate the registration transformation. However, these methods did
not exploit the local adaptivity of kernel regression for nonrigid
image registration with outliers.

Suppose we have sparse and irregularly distributed deformation
vectors $\{y_i, x_i\}_{i=1}^{P}$ given in the form

$$y_i = z(x_i) + \varepsilon_i, \quad x_i \in \Omega, \quad i = 1, \dots, P \qquad (1)$$

where $y_i$ is a sparse displacement vector (response variable) at
position (explanatory variable) $x_i$, and $z(\cdot)$ describes the
desired dense deformation field in the moving window (kernel)
$\Omega$, with independent and identically distributed zero-mean
noise $\varepsilon_i = \varepsilon(x_i)$. In statistics, the function
$z(\cdot)$ is treated as a regression of $y$ on $x$,
$z(x) = E\{y \mid x\}$. In this way, the reconstruction of the
nonrigid deformation field falls within the field of regression
techniques.
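
As an illustration of the regression view in Eq. (1), the following is a minimal sketch of zeroth-order (Nadaraya-Watson) kernel regression that reconstructs displacement vectors at query positions from sparse samples. It is not the JSS adaptive kernel regression itself: the isotropic Gaussian kernel, the function name `kernel_regress` and the optional per-sample `weights` argument (which could carry joint saliency values) are assumptions made for this example only.

```python
import numpy as np

def kernel_regress(sample_xy, sample_disp, query_xy, scale, weights=None):
    """Zeroth-order kernel regression of sparse displacement vectors.

    sample_xy:   (P, 2) positions x_i of the sparse deformation vectors
    sample_disp: (P, 2) sparse displacement vectors y_i
    query_xy:    (Q, 2) positions where the dense field z(x) is wanted
    scale:       kernel width in pixels (scalar or one value per query)
    weights:     optional (P,) per-sample weights (hypothetical hook for
                 joint saliency values; uniform weights if omitted)
    """
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_disp = np.asarray(sample_disp, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    if weights is None:
        weights = np.ones(len(sample_xy))
    weights = np.asarray(weights, dtype=float)
    h = np.broadcast_to(np.asarray(scale, dtype=float), (len(query_xy),))

    # Squared distances between every query and every sample: shape (Q, P).
    d2 = ((query_xy[:, None, :] - sample_xy[None, :, :]) ** 2).sum(axis=-1)
    # Isotropic Gaussian kernel (assumption: the paper's kernel is
    # anisotropic and steered by the local structure orientation).
    k = np.exp(-0.5 * d2 / h[:, None] ** 2) * weights[None, :]
    k_sum = k.sum(axis=1, keepdims=True)
    # Weighted average of the sample displacements at each query point.
    return (k @ sample_disp) / np.maximum(k_sum, 1e-12)
```

Setting the weight of a sample to zero removes its influence on the estimate, which is the mechanism by which low joint saliency values could suppress outlier vectors in such a scheme.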

Importantly, our JSS adaptive kernel regression first introduced two
forms of local adaptivity: selecting the shape of the local kernel
and the JSS-based weights within the moving kernel for local
estimation. The workflow of JSS adaptive kernel regression combined
with kernel scale selection for nonrigid registration is illustrated
in Fig. 1, where the levels differ in resolution but follow the same
procedure. At each level, the resulting deformation is composed of
the initial deformation and the current deformation. The proposed
method is an iterative scheme that alternates, at each iteration,
between block matching and JSS adaptive kernel regression with local
scale calculation. First, we learn the underlying characteristics of
the sub-blocks' similarities to obtain the sparse displacement
vectors of the roughly registered moving image. Then we compute the
JSM of the two images to highlight the local JSSs and estimate the
local scale of mismatch between the underlying saliency structures
for the subsequent kernel regression. Furthermore, we estimate the
orientation of every reference structure to design an anisotropic
kernel, and conflate the reference structure's structure scale with
the mismatch scale to select the kernel scale. Finally, with a
moving window/kernel in the kernel regression, the output dense
deformation vectors are estimated by emphatically weighting the
JSSs' sparse deformation vectors within the moving kernel of locally
adaptive scale (window size). Compared with our previous work [13],
the proposed local adaptive scale selection for kernel regression
corresponds to the module within the red dashed line in Fig. 1.

[Figure: flowchart from Reference Image A and Moving Image B,
through block matching from the first coarse level to the last fine
level, to the resulting deformation field and the registered moving
image B.]

Fig. 1. Flowchart of our algorithm in a coarse-to-fine framework.

B. Local Scale Selection 

The idea of scale (window size, or bandwidth) selection is not new
for kernel regression (or nonparametric regression). The scale
parameter controls the trade-off between bias and variance in the
local estimation of kernel regression. Two types of approaches to
scale selection in kernel regression have been reported over the
last decades [18]-[22]. The first comprises the plug-in methods,
which calculate the ideal scale by estimating the bias and the
variance under the mean squared error (MSE) between the real signal
and its approximation [18][19]. Alternatively, the goodness-of-fit
methods [20][21][22] are widely used as data-driven methods that do
not need bias estimates. These methods choose the scale based on
accuracy criteria, with the main goal of achieving an optimal
accuracy that balances the bias and the variance of the estimation.

Because the kernel scale for JSS adaptive kernel regression is used
to minimize the differences between the local saliency structures in
the two images by selecting an appropriate neighborhood size of
sparse deformation vectors, it is reasonable that the kernel scale
should be consistent with both the degree of mismatch (or local
deformation) between the underlying saliency structure pairs and the
scale of the saliency structures to be registered. Therefore, the
kernel scale is adaptively estimated by combining the local mismatch
scale with the structure scale of the underlying saliency
structures. Because the goodness-of-fit methods are capable of scale
selection for image intensities and their various derivatives, we
use this accuracy-based scale selection to estimate the kernel scale
for JSS adaptive kernel regression.

To derive accuracy criteria for selecting the local mismatch scale,
we should first automatically determine the local registration
inaccuracy (or registration uncertainty) during the nonrigid image
registration procedure. Locally evaluating the inaccuracy of
intensity-based nonrigid registration has recently attracted
increasing research interest (see [23]-[28] and references therein).
Assuming that the transformation parameters follow some prior
statistical distribution, most of these methods search the entire
image to infer the distribution of probable registration
transformations. Because the outliers present locally varying
differences in their spatial contexts, this global-model-based
prediction of the registration transformation has limitations in
accurately matching local structures. In contrast to these methods,
we present an iterative JSM-based local strategy, which represents
the degree of matching (or mismatch) of local saliency structures by
computing local similarity measures between the saliency-based [29]
appearance distributions on pixel pairs in the two images.

In fact, the JSM [1][13] assigns high joint saliency values to the
JSSs but zero or low joint saliency values to the mismatched
structures (or regions) in the two images, which makes the JSM an
ideal tool for indicating the local registration inaccuracy.
Specifically, the moving saliency structures that require large
deformations to be matched with reference structures lie in the
noncorresponding regions rather than the JSS regions; their joint
saliency values are therefore low in the JSM, and the local kernel
should be expanded to gather more sparse displacement vector samples
for the right deformation. In contrast, the moving saliency
structures that require zero or small deformation lie in the JSS
regions, where the joint saliency values are high in the JSM and a
small kernel already suffices to determine a well-aligned
deformation. Therefore, we adopt the JSM to design the mismatch
scale used to select the kernel scale for JSS adaptive kernel
regression.

We further investigate the selection of the structure scale in the
multi-resolution block-matching-based registration framework, which
is preferred for modeling local large deformations of local saliency
structures. As a compact representation of the original image at low
resolutions, large saliency structural regions initialize the sparse
deformation vectors, while the small local structures gradually
refine the sparse displacement vectors as the image resolution
increases. At the same resolution level, matching large local
structures can use a large kernel scale to reduce the deformation
variance (or increase the deformation smoothness), whereas matching
small local structures uses a small kernel scale to reduce the
registration (or deformation) bias error. This means that the kernel
scale should be large for large structural regions to adequately
smooth the deformation field, and small for small structural regions
to accurately match image details.

With this general scheme in mind, we use a multi-resolution
superpixel [30][18] representation, which preserves structure
boundaries, to hierarchically segment the reference image into
different saliency structural regions. The scales of the different
saliency structures at each resolution can then be computed from the
local amount of Gaussian smoothing within the superpixel-represented
saliency regions, in terms of variance in a scale-space framework.

III. Methods 
A. Structure Scale Selection 

First, we use an accurate superpixel representation called Simple
Linear Iterative Clustering (SLIC) [30] to segment the reference
image $I_R(x)$ into several small structural regions (superpixels)
that adhere to the saliency structure boundaries in the reference
image. These structural regions therefore represent the underlying
local saliency structures. Denote the image region as $\Phi$ and the
local structures as $S_i$, $i = 1, \dots, n$. Obviously,
$\Phi = \bigcup_{i=1}^{n} S_i$.

Simultaneously, a discrete scale space is constructed by diffusing
$I_R(x)$ with the anisotropic diffusion equation [31]

$$\frac{\partial I_{R,\sigma_k}(x)}{\partial t} = C(x)\,\Delta I_{R,\sigma_k}(x) + \nabla C(x) \cdot \nabla I_{R,\sigma_k}(x) \qquad (2)$$



where $C(x) = \exp\!\left(-\left(\|\nabla I_{R,\sigma_k}(x)\| / K\right)^{2}\right)$ is the diffusion
coefficient with contrast parameter $K$, and the subscript $\sigma_k$ denotes a
scale from the scale set $\Sigma = \{\sigma_1, \dots, \sigma_m\}$. In this work, we assume the
largest scale in the scale set is 15 pixels and the smallest 
one is 1 pixel. Because an image can be decomposed into a smoothed
component and a residual component by the anisotropic diffusion
filter, the intensity on a local superpixel $S_i$ can be represented
by the smoothed component $I_{R,\sigma_k}(S_i)$ and a residual
component $\varepsilon_{\sigma_k}(x) = I_R(x) - I_{R,\sigma_k}(x)$,
$x \in S_i$. The residual component can be modeled as a random field
with zero-mean Gaussian density. The objective of local structure
scale selection is to assign to each local structure $S_i$ a scale
$\sigma_k$, $k \in \{1, \dots, m\}$, from $\Sigma$ such that the
posterior probability
$P(\sigma_k \mid S_i) = \frac{P(S_i \mid \sigma_k) P(\sigma_k)}{P(S_i)} \propto P(S_i \mid \sigma_k) = \prod_{x \in S_i} p(x \mid \sigma_k)$
achieves its maximum, where
$p(x \mid \sigma_k) = P(I_R(x) \mid \sigma_k)$ is the likelihood of
the observed image at each pixel $x$ at scale $\sigma_k$, and
$P(S_i \mid \sigma_k)$ is the likelihood of the observed structural
region $S_i$ at scale $\sigma_k$.

To estimate the likelihood of the observed image at each pixel, we
use the well-known MDL criterion [35], which relates the probability
of an item to the length of the ideal code used to describe it,
namely:

$$P(I_R \mid \sigma_k) = 2^{-L(I_R \mid \sigma_k)} \qquad (3)$$

where $L(I_R \mid \sigma_k)$ denotes the description length of $I_R$
based on its decomposition at scale $\sigma_k$. This description
length can be expressed as
$L(I_R \mid \sigma_k) = L(I_{R,\sigma_k}) + L(\varepsilon_{\sigma_k})$.
Furthermore, because the description length of the smoothed
component $L(I_{R,\sigma_k})$ is assumed [19] to be inversely
proportional to $\sigma_k^2$, while the description length of the
residual component $L(\varepsilon_{\sigma_k})$ is proportional to
$\varepsilon_{\sigma_k}^2$,
$p(x \mid \sigma_k) = P(I_R(x) \mid \sigma_k)$ can be estimated by

$$p(x \mid \sigma_k) = A \exp\!\left(-\left[\frac{B}{\sigma_k^2} + C\,\varepsilon_{\sigma_k}^2(x)\right]\right), \quad x \in S_i \qquad (4)$$

where $A$ is the normalizing constant, and $B > 0$ and $C > 0$ are
empirical parameters adjusting the impact of the smoothed component
and the residual component, both set to 1 in this work.
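
The likelihood-based selection above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: plain Gaussian smoothing is swapped in for the anisotropic diffusion of Eq. (2), the normalizing constant $A$ is treated as independent of the scale, the MRF coherence term is omitted, and the superpixel label map is assumed to be given (for instance by a SLIC implementation). The names `gaussian_smooth` and `select_structure_scales` are hypothetical.

```python
import numpy as np

def gaussian_smooth(image, sigma):
    """Separable Gaussian smoothing with edge padding.

    Stand-in for the anisotropic diffusion filter (assumption).
    """
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="edge")
    # Filter along rows, then columns; "valid" removes the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def select_structure_scales(image, labels, scales, B=1.0, C=1.0):
    """Pick, for each superpixel, the scale with the highest log-likelihood.

    image:  2D reference image
    labels: 2D integer superpixel label map of the same shape
    scales: candidate scales sigma_k (in pixels)
    Returns a dict mapping superpixel label -> selected scale.
    """
    image = np.asarray(image, dtype=float)
    labels = np.asarray(labels)
    region_ids = np.unique(labels)
    log_lik = np.empty((len(scales), len(region_ids)))
    for k, sigma in enumerate(scales):
        residual = image - gaussian_smooth(image, sigma)
        # log p(x | sigma_k) from Eq. (4), dropping the constant log A.
        log_p = -(B / sigma ** 2 + C * residual ** 2)
        for r, region in enumerate(region_ids):
            # The product over pixels in S_i becomes a sum of logs.
            log_lik[k, r] = log_p[labels == region].sum()
    best = np.asarray(scales, dtype=float)[log_lik.argmax(axis=0)]
    return dict(zip(region_ids.tolist(), best.tolist()))
```

In this sketch a homogeneous region has a near-zero residual at every scale, so the $B/\sigma_k^2$ term drives it to the largest candidate scale, while a finely textured region is penalized by its residual and tends towards a smaller scale.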

By considering the scale coherence between neighboring local
structures, a Markov random field (MRF) model is also incorporated
in this structure scale selection [18][19]. As a result, the final
structure scale selection is defined as

$$\sigma_s = \arg\max_{\sigma_k} \; P(\sigma_k \mid S_i) + \lambda\, \delta(\sigma_k, \sigma_l) \exp\!\left(-\left(\mu(S_i) - \mu(S_j)\right)^2\right) \qquad (5)$$

where

$$\delta(\sigma_k, \sigma_l) = \begin{cases} 1, & \text{if } \sigma_k = \sigma_l \\ 0, & \text{otherwise} \end{cases}$$

$\mu(S_i)$ and $\mu(S_j)$ are the mean intensities on $S_i$ and
$S_j$, respectively, and $\sigma_k$ and $\sigma_l$ are the scales on
$S_i$ and $S_j$ from the scale set $\Sigma$. In equation (5), the
first term is the posterior probability on $S_i$, and the second
term is a smoothness function of the local structure $S_i$ and its
neighboring local structure $S_j$. The second term prefers the same
scale labeling for neighboring pairs of appearance-similar
superpixel regions, and penalizes the same scale labeling for
neighboring pairs of appearance-different superpixel regions. The
impact of the MRF is controlled by the parameter $\lambda$, which
needs to be set to a small value (0.05) to avoid over-smoothing that
would unintentionally increase the structure scales of small local
structures.

B. Mismatch Scale Calculation using JSM 

Following our previous work [13], we define a center-surround
saliency operator based on the contrast among neighboring local
structure tensors (LSTs). This contrast emphasizes the dissimilarity
or discrepancy between neighboring local structure tensors. For a
given point $x_0$ and its neighborhood $\Theta$, the saliency value
$S(x_0)$ at $x_0$ in a saliency map can be computed through

$$S(x_0) = \operatorname{avg}_{x \in \Theta} \|\mathrm{LST}(x) - \mathrm{LST}(x_0)\|_D \qquad (6)$$

where $\|\cdot\|_D$ defines a distance metric describing the
dissimilarity between two LSTs. The operator avg computes the
average of the dissimilarities within the neighborhood $\Theta$ of
$x_0$. The distance metric [32] between two tensors $T_1$ and $T_2$
can be expressed as



-2 \\D 



T 2 ||^-^Tr 2 (T 1 -T 2 )) (7) 



where $\|T_1 - T_2\|$ is the Euclidean distance between the two
tensors $\{T_1, T_2\}$, and $\mathrm{Tr}(\cdot)$ denotes the trace
of a matrix.
After the two normalized saliency maps have been obtained to
indicate the local saliency structure distribution, the JSM is built
by describing the degree of matching between the two saliency maps
at every pixel pair in the overlapping regions of the two images.
Given a point $x_R$ in the reference image and its corresponding
point $x_M$ in the moving image after the initial transformation,
their joint saliency value in the JSM is defined as



$$JS(x_R, x_M) = \min\{S_R(x_R), S_M(x_M)\} \cdot \frac{F\,G}{G + \|\mathrm{LST}(x_R) - \mathrm{LST}(x_M)\|_D} \qquad (8)$$



where $\{S_R(\cdot), S_M(\cdot)\}$ denote the saliency values in the
saliency maps of the reference image and the moving image, and
$F = 10$ and $G = 5\max(\|\mathrm{LST}(x_R) - \mathrm{LST}(x_M)\|_D)$
are two empirical parameters used to bound the final JSM values
between 0 and 1. Note that both corresponding pixels may be assigned
high saliency values in the structure-tensor-based saliency maps
even though their local variations of gradient orientation are in
fact totally different. To avoid this situation, we also include the
dissimilarity measure between $\mathrm{LST}(x_R)$ and
$\mathrm{LST}(x_M)$ in the denominator of equation (8).

Because the JSM represents the degree of matching between the
underlying saliency structure pairs, the mismatch scales should be
inversely proportional to the JSM values. Therefore, a zero or very
small mismatch scale is assigned to corresponding structural regions
with high JSM values in the multi-resolution registration context,
while a large mismatch scale is given to unmatched structural
regions with low JSM values. In addition, because background and
homogeneous regions contribute little to nonrigid registration based
on kernel regression, their mismatch scales are set to zero.
Accordingly, we define the mismatch scale $\sigma_m$ as

$$\sigma_m(x) = \begin{cases} 0, & \text{if } x \in \text{background or homogeneous regions} \\ 1/JS, & \text{otherwise} \end{cases} \qquad (9)$$

By this definition, the JSM is transformed into the mismatch scale
map that is used in the next step, the JSS and local scale adaptive
kernel regression.

C. Local Adaptive Scale for JSS Adaptive Kernel Regression 

Given the structure scale $\sigma_s$ and the mismatch scale
$\sigma_m$, we are ready to define the local kernel scale $\sigma_d$
as

$$\sigma_d = \max\{\sigma_s \times \sigma_m, 1\} \qquad (10)$$

where the lower bound of 1 prevents the local kernel scale from
being less than 1 pixel.
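
The two definitions above translate into a few array operations. The following minimal sketch assumes that the joint saliency map, the structure scale map and a boolean mask of background or homogeneous pixels are already available as arrays of equal shape; the function names and the small `eps` guard against division by zero are assumptions of this example, since the text does not state how a joint saliency value of exactly zero is handled.

```python
import numpy as np

def mismatch_scale(jsm, ignore_mask, eps=1e-6):
    """Eq. (9): 0 in background/homogeneous pixels, 1/JS elsewhere."""
    jsm = np.asarray(jsm, dtype=float)
    # eps avoids dividing by zero where JS = 0 (assumption of this sketch).
    sigma_m = 1.0 / np.maximum(jsm, eps)
    return np.where(ignore_mask, 0.0, sigma_m)

def kernel_scale(sigma_s, sigma_m):
    """Eq. (10): product of both scales, never below 1 pixel."""
    product = np.asarray(sigma_s, dtype=float) * np.asarray(sigma_m, dtype=float)
    return np.maximum(product, 1.0)
```

For example, a pixel with structure scale 2 and joint saliency 0.25 receives a mismatch scale of 4 and a kernel scale of 8 pixels, whereas any masked background pixel falls back to the minimum kernel scale of 1 pixel regardless of its structure scale.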

Fig. 2 illustrates the JSM, structure scale map, mismatch scale map
and local kernel scale map in the multi-resolution scheme, with the
color scale representing different normalized joint saliency values
or scale values. Fig. 2(a)-(b) are the reference and moving images
at a resolution of 384 × 288 pixels. Fig. 2(c) is the JSM at the
192 × 144 pixel resolution. Fig. 2(d)-(f) display the structure
scale, mismatch scale and kernel scale maps at the finest
resolution. Fig. 2(g)-(i) and Fig. 2(j)-(l) display the structure
scale, mismatch scale and kernel scale maps at the 192 × 144 and
96 × 72 pixel resolutions, respectively.

Fig. 2. Flower images and their multi-resolution JSM, structure
scale maps, mismatch scale maps and kernel scale maps. (a)-(b) The
reference and moving flower images at the 384 × 288 pixel
resolution; (c) JSM at the 192 × 144 resolution; (d)-(f) structure
scale, mismatch scale and kernel scale maps at the finest 384 × 288
pixel resolution; (g)-(i) structure scale, mismatch scale and kernel
scale maps at the 192 × 144 pixel resolution; (j)-(l) structure
scale, mismatch scale and kernel scale maps at the 96 × 72 pixel
resolution.

As the left column of Fig. 2 shows, the multi-resolution
superpixel-based structure scale map succeeds in determining the
structure scales of the saliency structures in the multi-resolution
images. Specifically, at the coarse resolution it roughly segments
the foreground structural regions and does not segment the small
structures (e.g., the stamen filament in the upper right corner of
the reference image), so that the structure scale maps contain more
moderate structure scales (within the large superpixels) than
maximal and minimal scales. As the increasing image resolution
reduces the size of the superpixels and enhances the image details,
the structure scale maps can precisely recognize the small
structures (e.g., the small petals, the petal boundaries and the
stamen filament in Fig. 2(g), (d)), so that small structure scales
are appropriately assigned to these small structures while the
maximal structure scales appear in background or homogeneous
regions.

The large, smooth foreground structural regions at the coarse
resolution initialize the sparse deformation vectors for the
subsequent multi-resolution nonrigid registration, while the small
saliency structures displaying saliency details at the fine
resolution refine the sparse deformation vectors. As these small
structures gradually join the iterative kernel regression, the
background and homogeneous foreground regions obtain a large
structure scale, zero mismatch scale, and the smallest kernel scale
(1) (Fig. 2(d)-(f) and Fig. 2(g)-(i)). The small saliency structures
at the fine resolution have small structure scales, so their final
kernel scales depend on their mismatch scales (Fig. 2(e), (h), (k)):
the locally matched small saliency structures with fine saliency
details have large joint saliency values and very small mismatch
scales, while the mismatched small saliency structures have very
small joint saliency values and large mismatch scales (especially at
the missing correspondence and local large deformations in the upper
right corner of the images). Consequently, the matched small
structures have small kernel scales, while the mismatched saliency
structures have relatively large kernel scales at the fine
resolution (see Fig. 2(f), (i), (l)). According to this analysis of
the multi-resolution scheme, the local saliency structures of the
moving image are gradually matched to the corresponding reference
structures by iteratively selecting the structure scales and
mismatch scales for the JSS and local scale adaptive kernel
regression.

IV. Experimental Results 

In this section, we use a set of 2D image pairs to validate the
performance of the proposed method by comparing it with our previous
method [13], Advanced Normalization Tools (ANTs)¹ with elastic
transformation and mutual information (AMI) [33], AMI with
cost-function masking (AMM), Diffeomorphic Demons with
diffusion-like regularization (DDD) [34] in Medical Image
Processing, Analysis, and Visualization (MIPAV)², and fast B-spline
with MI (BMI) in 3D Slicer³. The parameters of our two methods are:
the number of pyramid levels is 5; the local similarity measure is
mutual information. We set the parameters of AMI and AMM as: the
histogram bin size is 32; the number of pyramid levels is 3; the
iterations are set to 100 × 100 × 10; the gradient step is 10; the
default regularization is Gaussian filtering with a sigma of 3. The
parameters of the DDD method are set as follows: the variance of the
smoothing kernels is 2; the step scale is 1; the number of pyramid
levels is 5; the number of iterations is 100. The parameters of the
BMI method are selected as: the number of iterations is set to 100;
the grid size is 15; the histogram bin size is 32; the number of
spatial samples is 50000; the maximum deformation is 20. With these
parameters, all the methods mentioned above achieve their best
performance.

¹ http://www.picsl.upenn.edu/ANTs
² http://mipav.cit.nih.gov
³ http://www.slicer.org

Fig. 3. Brain tumor resection image registration. (a)-(b) The
reference and moving images; (c) the proposed method; (d) the
previous method; (e) AMI; (f) AMM; (g) DDD; (h) BMI; (i)-(p) the
same sulcus in (a)-(h) with desired spatial positions located by red
crosses; (q) structure scale map; (r) JSM; (s) mismatch scale map;
(t) kernel scale map.

To evaluate the performance of the six competing methods, we not
only zoom in on some small local structures in the registered moving
images and display their deviation from the desired spatial
positions marked by several red crosses, but also measure the
registration errors at densely distributed landmarks selected by an
expert. The landmark selection fully excludes outlier features while
paying more attention to identifiable locations at the small local
saliency structures. Subsequently, the average landmark-based
registration error distances and the corresponding standard
deviations of the six methods are listed for every case. Lower
average error distances and lower standard deviations imply more
accurate alignment of local structures.
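
The landmark-based error measure described above can be computed as in the following minimal sketch. The function name `landmark_error` is hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption, since the text does not state which estimator is used.

```python
import numpy as np

def landmark_error(registered_xy, reference_xy):
    """Mean and standard deviation of landmark error distances (pixels).

    registered_xy: (N, 2) landmark positions in the registered moving image
    reference_xy:  (N, 2) desired positions in the reference image
    """
    diff = np.asarray(registered_xy, dtype=float) - np.asarray(reference_xy, dtype=float)
    # Euclidean distance per landmark pair.
    dist = np.linalg.norm(diff, axis=1)
    # Population standard deviation (ddof=0); an assumption of this sketch.
    return dist.mean(), dist.std()
```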

The first experiment involves matching pre- and post-operative brain
tumor resection images. Brain tissue severely compressed by the
tumor in the preoperative image (Fig. 3(a)-(b)) expands after tumor
resection, which introduces not only the missing correspondence of
the tumor in the post-operative image but also local large
deformations caused by brain shift. Fig. 3(c)-(h) are the registered
results of the proposed method, the previous method, AMI, AMM, DDD
and BMI. Visual inspection reveals that the proposed method, the
previous method, AMI and AMM clearly perform better than DDD and
BMI, because the local brain deformation produced by the latter two
methods is either insufficient or somewhat excessive. The
deformations of the sulcus near the tumor region with missing
correspondence in Fig. 3(a)-(h) are highlighted in Fig. 3(i)-(p).
Comparing Fig. 3(k) with Fig. 3(l) shows the improvement in
registration accuracy of the proposed method over the previous
method. The structure scale map, JSM, mismatch scale map and kernel
scale map at the finest resolution of the images for the proposed
method are shown in Fig. 3(q)-(t).

Fig. 4 shows a flower image registration case, where the 
Fig. 4. Flower image registration. (a)-(b) The reference and moving 
images, (c) the proposed method, (d) the previous method, (e) AMI, 
(f) AMM, (g) DDD, (h) BMI, (i)-(p) the same stamen filament in (a)- 
(h) with the desired spatial positions marked by red crosses, (q) structure 
scale map, (r) JSM, (s) mismatch scale map, (t) kernel scale map. 



stamen filament in the right part of the reference image (Fig. 
4(a)) is driven by the movement of the center flowers. In 
addition, some buds behind the stamen filament in the moving 
image (Fig. 4(b)) disappear in the reference image. A desirable 
registration result in this case should properly deform the 
stamen filament according to the reference image, regardless 
of the missing buds and the flowers with large deformation. 
Fig. 4(c)-(h) show the registered results of the six methods, 
where the proposed method, our previous method and BMI 
outperform the other three methods on visual evaluation. 
However, the enlarged images (Fig. 4(i)-(p)) of the same 
stamen filament demonstrate that all the approaches except the 
proposed method fail to deform the stamen filament accurately. 
Fig. 4(q)-(t) show the structure scale map, JSM, mismatch 
scale map and kernel scale map of the proposed method in 
this case. 

A more challenging experiment is shown in Fig. 5. With 
the distortion of the hat, all the letters on the hat deform 
as well, and the black stripes in 'E' in particular undergo local large 
deformation. In addition, the 'T' that is missing in the reference image 
(Fig. 5(a)) appears in the moving image (Fig. 5(b)). The main 
difficulty in this experiment lies in the reasonable alignment 
of local small-scale structures such as the stripes in 'E'. 
Because many tiny structures are close to each other, 
a mismatch in one structure subsequently affects the 
deformation of its neighboring tiny structures and thus 
leads to poor structure alignment in a whole region. Fig. 5(c)- 
(h) show the moving images registered by the six methods. The 
enlarged images of the stripes of 'E' are displayed in Fig. 5(i)- 
(p). By comparison, only the proposed method and the BMI method 
preserve the matching accuracy of the stripes. The structure 




Fig. 5. Hat image registration. (a)-(b) The reference and moving 
images, (c) the proposed method, (d) the previous method, (e) AMI, 
(f) AMM, (g) DDD, (h) BMI, (i)-(p) the same stripes in (a)-(h) with 
the desired spatial positions marked by red crosses, (q) structure scale map, 
(r) JSM, (s) mismatch scale map, (t) kernel scale map. 

TABLE I 

Landmark registration errors (mean ± SD) of the six 
methods for the three grayscale image 
registrations 

         Proposed    Previous    AMI         AMM         DDD         BMI 
Brain    0.96±1.83   0.97±1.91   0.95±1.48   1.16±1.99   1.60±3.08   LT5IT.89 
Flower   0.93±3.01   1.14±2.96   1.69±3.49   8.38±8.82   4.42±5.65   1.55±3.49 
Hat      0.87±1.01   0.91±1.25   1.36±1.01   2.75±3.63   4.26±4.67   2.50±4.05 



scale map, JSM, mismatch scale map and kernel scale maps 
of the proposed method are shown in Fig. 5(q)-(t). 

Table I compares the average registration errors and the 
corresponding standard deviations at the 38, 20 and 40 
landmarks manually selected in the brain, flower and hat 
image registrations, respectively. The proposed 
method achieves sub-pixel registration accuracy 
in all three experiments, with registration errors 
of (0.96±1.83, 0.93±3.01, 0.87±1.01), while the registration 
errors of the previous and AMI methods are 
(0.97±1.91, 1.14±2.96, 0.91±1.25) and (0.95±1.48, 
1.69±3.49, 1.36±1.01), respectively. The other three methods do not 
perform well in these three challenging image registrations. 
Overall, compared with the other state-of-the-art methods, 
only our method achieves sub-pixel registration 
accuracy in all of these challenging image registrations with 
outliers. 
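
The sub-pixel claim can be checked directly against the mean errors in Table I. The short sketch below is illustrative only: it assumes that "sub-pixel" means a mean landmark error below 1 pixel, the function name `subpixel_everywhere` is hypothetical, and the one table entry that is not legible in the source (BMI, brain image) is recorded as `None` rather than guessed.

```python
# Mean landmark errors (in pixels) transcribed from Table I.
# None marks a value that is not legible in the source.
mean_errors = {
    "proposed": {"brain": 0.96, "flower": 0.93, "hat": 0.87},
    "previous": {"brain": 0.97, "flower": 1.14, "hat": 0.91},
    "AMI":      {"brain": 0.95, "flower": 1.69, "hat": 1.36},
    "AMM":      {"brain": 1.16, "flower": 8.38, "hat": 2.75},
    "DDD":      {"brain": 1.60, "flower": 4.42, "hat": 4.26},
    "BMI":      {"brain": None, "flower": 1.55, "hat": 2.50},
}

def subpixel_everywhere(errors, threshold=1.0):
    """Return the methods whose mean error is below `threshold` in every case.

    A missing value (None) is treated as not satisfying the threshold.
    """
    return [
        method
        for method, cases in errors.items()
        if all(v is not None and v < threshold for v in cases.values())
    ]

print(subpixel_everywhere(mean_errors))  # ['proposed']
```

The missing BMI entry does not affect the outcome, since the BMI errors for the flower and hat images already exceed 1 pixel.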
V. Conclusion 

In this paper, by designing a JSM-driven kernel scale 
estimator for aligning local structures with missing 
correspondence and local large deformation, we improve our 
previously proposed nonrigid registration based on JSS adaptive 
kernel regression to achieve accurate local structure matching. 
Specifically, for every local structure to be registered, we 
combine its mismatch scale, characterized by the JSM, with 
its intrinsic structure scale so that the kernel function can 
adaptively control the number of sparse displacement vector 
samples participating in the JSS adaptive kernel regression for 
nonrigid image registration. 
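
How the kernel scale controls the number of participating samples can be illustrated with a simplified sketch. The code below is not the proposed method: it assumes a zeroth-order (Nadaraya-Watson) estimator with an isotropic Gaussian kernel, it takes the kernel scale at each position as a given input instead of deriving it from the structure and mismatch scales, and the function name `jss_weighted_kernel_regression` and all argument names are hypothetical.

```python
import numpy as np

def jss_weighted_kernel_regression(query_pts, sample_pts, sample_vecs,
                                   saliency, kernel_scale):
    """Saliency-weighted kernel regression of sparse deformation vectors.

    query_pts:    (Q, 2) positions where dense vectors are reconstructed
    sample_pts:   (S, 2) positions of the sparse deformation vectors
    sample_vecs:  (S, 2) sparse deformation vectors
    saliency:     (S,)   joint saliency of each sample (0 for outliers)
    kernel_scale: (Q,)   Gaussian kernel width at each query position
    Returns the (Q, 2) dense vectors and the (Q, S) sample weights.
    """
    query_pts = np.asarray(query_pts, dtype=float)
    sample_pts = np.asarray(sample_pts, dtype=float)
    sample_vecs = np.asarray(sample_vecs, dtype=float)
    saliency = np.asarray(saliency, dtype=float)
    h = np.asarray(kernel_scale, dtype=float)[:, None]         # (Q, 1)

    diff = query_pts[:, None, :] - sample_pts[None, :, :]      # (Q, S, 2)
    sq_dist = np.sum(diff ** 2, axis=-1)                       # (Q, S)
    # Gaussian kernel scaled per query point, weighted by joint saliency.
    weights = saliency[None, :] * np.exp(-0.5 * sq_dist / h ** 2)
    norm = np.maximum(weights.sum(axis=1, keepdims=True), 1e-12)
    dense = (weights @ sample_vecs) / norm                     # (Q, 2)
    return dense, weights

# Two samples; the same query point evaluated with a narrow and a wide kernel.
pts = [[0.0, 0.0], [1.0, 0.0]]
vecs = [[1.0, 0.0], [3.0, 0.0]]
dense, w = jss_weighted_kernel_regression(
    [[0.0, 0.0], [0.0, 0.0]], pts, vecs, [1.0, 1.0], [0.1, 100.0])
print(dense)
```

With the narrow kernel only the nearest sample contributes, so the estimate follows the local vector; with the wide kernel both samples contribute almost equally and the estimate approaches their average. A sample with zero saliency receives zero weight at any kernel scale.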

As indicated in [5], the iterative approach can improve 
the nonparametric estimates. In our work, the 
adaptive kernel scale selection, the kernel shape adaptivity 
and the JSS-based weighted estimation [13], iteratively deployed 
by the multi-resolution JSS adaptive kernel regression, are 
significantly effective in improving the performance of local 
structure matching in nonrigid image registration. The 
proposed method achieves a continuous and locally smooth 
transformation for accurately matching local small 
structures in the presence of outliers and compares favorably with 
state-of-the-art methods in cases of both missing correspondence 
and local large deformations. It is important to note that the 
computational cost of the proposed method is high even 
for 2D image registration, although the 
algorithm is easy to extend to 3D image registration. To reduce 
the computational burden of our approach for future 
3D/4D nonrigid image registration, fast methods are required to 
estimate the local adaptive structure scales, the mismatch scales and 
the discrete local structure-adaptive Gaussian kernels [36], 
and to implement structure-adaptive kernel regression at every 
voxel. 